Exploration of Projection Spaces¶

Feature Visualization¶

Dependencies¶

In [16]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from umap import UMAP
import seaborn as sns
import plotly.express as px

import pandas as pd
import plotly.graph_objs as go

import warnings

warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)

Import the Data¶

The dataset is an image dataset for microscopy image classification, acquired from a previous course, AI in Life Sciences. It consists of 9632 training images across 9 classes; however, only 7890 images are used for training and the rest are used for testing. The goal is to analyze the learning process of the model and to visualize its features. We investigate Deep Learning (DL) training algorithms and their influence on the explainability of neural network models. The overall aim is to visualize the flow of information within the deep NN using factors that humans can interpret, even if the underlying model uses more complex factors, which enables the generation of human-interpretable explanations.

In this notebook, we visualize the learning process of the model, with the inter-epoch trajectory of learning as the main focus. Initially, the network has random weights and learns features from the data. As the epochs proceed, the model learns and can use the learned features and updated weights to improve the classification results. The hidden-layer features are visualized and should show increasingly clear cluster formation.

To better understand how the features.npz file was created, please refer to the Train_Model.ipynb notebook. In short, features.npz contains the hidden-layer features of the model, extracted after each epoch and saved to the file. The features.npz file is then used to visualize the learning process of the model.

In [2]:
# load the data
data = np.load(r"./data/features.npz")["arr_0"]
labels = np.load(r"./data/labels.npy")
labels = labels.tolist()
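The shapes the rest of the notebook relies on can be checked right after loading. The sketch below uses a zero-filled placeholder instead of the real features.npz (which is not shipped here) and assumes the stacking order (epochs, images, hidden units); the 10 epochs, 1742 test images, and 1280 hidden units follow from the cells further down.

```python
import numpy as np

# Placeholder standing in for np.load("./data/features.npz")["arr_0"];
# the real array is assumed to be stacked as (epochs, images, hidden_units).
data = np.zeros((10, 1742, 1280), dtype=np.float32)

n_epochs, n_images, n_features = data.shape
print(n_epochs, n_images, n_features)  # 10 1742 1280

# one row per image per epoch -> 17420 rows once flattened into a table
assert n_epochs * n_images == 17420
```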

We will add some metadata to the data: the number of classes, which is 9. The metadata will help us visualize and better understand the data. We define a unique color for each class; these colors are used in the plots. At the end, the RGB values are divided by 255 to bring them into the range 0 to 1.

In [3]:
colors_per_class = {
    "A549": [254, 202, 87],
    "CACO-2": [255, 107, 107],
    "HEK 293": [10, 189, 227],
    "HeLa": [255, 159, 243],
    "MCF7": [16, 172, 132],
    "PC-3": [128, 80, 128],
    "RT4": [87, 101, 116],
    "U-2 OS": [52, 31, 151],
    "U-251 MG": [100, 100, 255],
}
colors = [colors_per_class[label] for label in labels]
colors = np.array(colors) / 255

Projection¶

For the downprojection we use PCA, t-SNE, and UMAP, applied to the hidden-layer features of the model. The projections serve to get a better understanding of the data and to see which projection method suits it best. For now, the projections are run with default settings.

PCA¶

In [4]:
# plot the data
pca = PCA(n_components=2)
fig, ax = plt.subplots(2, 5, figsize=(15, 10))
for i in range(2):
    for j in range(5):
        data_2d = pca.fit_transform(data[i * 5 + j])
        ax[i, j].scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
        ax[i, j].set_title("Epoch {}".format(i * 5 + j))

plt.show()

Observations: PCA did not separate the points much; they remain very close to each other, so the PCA projection is not very suitable for this data. However, the separation in epoch 7 is better than in the other epochs. One can see that in the first epoch the points lie very close together, and as the epochs proceed they become more separated.
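How much of the total variance the two plotted components actually capture can be read off from PCA's `explained_variance_ratio_`; a low sum means the 2-D scatter hides most of the structure. A minimal sketch, with a random matrix standing in for one epoch of the real `data`:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for one epoch's hidden-layer features
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 64))

pca = PCA(n_components=2)
pca.fit(features)

# Fraction of the total variance captured by the two plotted components
print(pca.explained_variance_ratio_.sum())
```

On the real features one would call this on `data[epoch]` for each epoch to see whether the 2-D view becomes more representative as training proceeds.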

t-SNE¶

In [5]:
fig, ax = plt.subplots(2, 5, figsize=(15, 10))
tsne = TSNE(n_components=2, verbose=0, n_jobs=-1)
for i in range(2):
    for j in range(5):
        data_2d = tsne.fit_transform(data[i * 5 + j])
        ax[i, j].scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
        ax[i, j].set_title("Epoch {}".format(i * 5 + j))
plt.show()

Observations: With t-SNE the points are more separated than with PCA, and the results are very good, matching what was expected. One can clearly see that the points in the first epoch are very close to each other, and as the epochs proceed they become more separated.

UMAP¶

In [6]:
# plot the data using UMAP
fig, ax = plt.subplots(2, 5, figsize=(15, 10))
umap = UMAP(n_components=2, verbose=0, n_jobs=-1)
for i in range(2):
    for j in range(5):
        data_2d = umap.fit_transform(data[i * 5 + j])
        ax[i, j].scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
        ax[i, j].set_title("Epoch {}".format(i * 5 + j))
plt.show()

Observations: UMAP also worked out very well; the points are more separated than with PCA and t-SNE. However, some classes are not separated as well in the last epoch compared to t-SNE.

Bring it all together¶

Here all the epochs are plotted into one big plot.

In [6]:
# plot the data
fig, ax = plt.subplots(figsize=(15, 10))
for i in range(10):
    tsne = TSNE(n_components=2, init="pca", verbose=0, n_jobs=-1)
    data_2d = tsne.fit_transform(data[i])
    ax.scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
plt.show()
In [7]:
# plot the data
fig, ax = plt.subplots(figsize=(15, 10))
tsne = TSNE(n_components=2, init="pca", verbose=0, n_jobs=-1)
for i in range(10):
    data_2d = tsne.fit_transform(data[i])
    ax.scatter(data_2d[:, 0], data_2d[:, 1], s=1, label="Epoch {}".format(i))
    ax.plot(data_2d[:, 0], data_2d[:, 1], alpha=0.09)
ax.legend()
plt.show()

Not much information can be read from this plot; a fix for it follows later in the notebook. One can see, however, that there are many points in the middle, and for the last epoch many points in the corners.

t-SNE and UMAP with different metrics¶

t-SNE with different metrics¶

To find the best metric for t-SNE we run over three different metrics: euclidean, manhattan, and hamming. The results are shown in the following plots. The results from euclidean and manhattan are not very different; the best results come from the euclidean metric, while the results with the hamming metric are very poor.

In [10]:
TSNE_METRICS = ["euclidean", "manhattan", "hamming"]

for metric in TSNE_METRICS:
    fig, ax = plt.subplots(2, 5, figsize=(15, 10))
    for i in range(2):
        for j in range(5):
            tsne = TSNE(n_components=2, init="pca", verbose=0, n_jobs=-1, metric=metric)
            data_2d = tsne.fit_transform(data[i * 5 + j])
            ax[i, j].scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
            ax[i, j].set_title("Epoch {}".format(i * 5 + j))
    fig.suptitle("t-SNE with metric {}".format(metric))
    plt.show()

UMAP with different metrics¶

To find the best metric for UMAP we run over 4 different metrics: euclidean, hamming, manhattan, and correlation.

In [6]:
UMAP_METRICS = [
    "euclidean",
    "hamming",
    "manhattan",
    "correlation",
]

for metric in UMAP_METRICS:
    fig, ax = plt.subplots(2, 5, figsize=(15, 10))
    umap = UMAP(n_components=2, metric=metric, verbose=0, n_jobs=-1)
    for i in range(2):
        for j in range(5):
            data_2d = umap.fit_transform(data[i * 5 + j])
            ax[i, j].scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
            ax[i, j].set_title("Epoch {}".format(i * 5 + j))
    fig.suptitle("UMAP with metric {}".format(metric))
    plt.show()

Observations: Almost all the metrics give similar results. Good results come from the euclidean, manhattan, and correlation metrics, while the results with the hamming metric are very poor.

Bringing the data into table format¶

Until now we were using the 3D numpy array to plot the data. Now we bring the data into table format: we iterate over every epoch and every image and save the 1280 hidden activations as a row in the table. The table has 1280 feature columns, and the number of rows equals the number of images times the number of epochs.

In [7]:
def generate_tabel_data(data):
    # flatten the (epoch, image, feature) array into rows of [epoch, image, *features]
    tabel_data = []
    for epoch in range(data.shape[0]):
        for image in range(data.shape[1]):
            tabel_data.append([epoch, image, *data[epoch, image]])
    return tabel_data
In [8]:
tabel_data = generate_tabel_data(data)

df = pd.DataFrame(
    tabel_data,
    columns=["epoch", "image", *["x_{}".format(i) for i in range(data.shape[2])]],
)
df["label"] = labels * 10
In [9]:
df
Out[9]:
epoch image x_0 x_1 x_2 x_3 x_4 x_5 x_6 x_7 ... x_1271 x_1272 x_1273 x_1274 x_1275 x_1276 x_1277 x_1278 x_1279 label
0 0 0 -0.061640 0.669427 -0.037390 -0.072912 0.177839 -0.117206 0.195436 0.039985 ... 0.015348 0.738867 0.462114 -0.096684 -0.084439 0.376865 0.006146 0.499843 0.132847 U-2 OS
1 0 1 -0.174966 -0.015820 -0.057514 0.045858 0.007711 0.070376 0.135525 1.686104 ... 0.009815 -0.129545 -0.144017 -0.123344 1.071622 0.496978 -0.097275 -0.057270 -0.029485 CACO-2
2 0 2 -0.143236 0.192292 -0.024325 0.146251 -0.117585 -0.097257 -0.029092 0.468650 ... -0.003601 0.068411 -0.039624 0.016348 0.345940 0.581720 -0.153233 -0.116875 -0.026813 HeLa
3 0 3 0.172030 1.869631 0.315535 -0.115844 0.644933 -0.177177 -0.084760 -0.005555 ... -0.004633 0.210583 -0.106762 0.399959 0.495958 0.392284 0.268443 0.297449 -0.048701 CACO-2
4 0 4 -0.129526 0.141332 -0.056453 0.363036 -0.175806 -0.139477 -0.146133 -0.091358 ... 1.038976 0.721481 -0.086710 -0.083307 0.307861 0.017224 -0.127287 0.325065 -0.097920 MCF7
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17415 9 1737 -0.145046 -0.128737 -0.112647 0.142011 1.271666 -0.239937 -0.147267 0.206571 ... 0.059964 -0.173431 -0.154945 -0.042000 -0.135171 -0.009146 0.363955 -0.188163 -0.091321 PC-3
17416 9 1738 2.309876 -0.107943 0.096737 0.388151 -0.094068 0.135940 1.428808 0.396734 ... -0.114613 0.117047 0.073088 -0.030453 0.216468 0.138633 1.105582 -0.129447 1.259154 HEK 293
17417 9 1739 -0.109858 -0.120113 2.313648 1.052860 0.306034 0.665497 -0.037204 0.915989 ... 0.494094 -0.126748 -0.015180 -0.131770 0.307820 -0.096703 -0.070606 -0.128338 0.676635 RT4
17418 9 1740 -0.166764 -0.079054 -0.100959 -0.092999 0.723891 -0.145387 0.248314 0.729864 ... 0.588329 1.497055 -0.054499 0.220291 -0.073841 -0.063293 -0.061063 -0.122453 -0.083355 PC-3
17419 9 1741 -0.180064 0.002528 -0.110202 0.128322 1.424868 -0.206283 -0.096041 0.099965 ... 0.156708 0.045574 -0.079024 0.390233 -0.107368 -0.099926 -0.159344 -0.133758 -0.077656 PC-3

17420 rows × 1283 columns

Add the meta data to the data¶

In [35]:
sns.set(rc={"figure.figsize": (20, 15)})
fig, ax = plt.subplots(2, 5, figsize=(20, 15))
for i in range(2):
    for j in range(5):
        sns.scatterplot(
            x="x",
            y="y",
            hue="label",
            data=df[df["epoch"] == i * 5 + j],
            legend="full",
            ax=ax[i, j],
        )
        ax[i, j].set_title("Epoch {}".format(i * 5 + j))
plt.show()

Observations: From the above plots, starting from epoch 5 the model features visibly move into clusters, which indicates that the model converges and is able to generalize to the unseen dataset.
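The visual impression of tightening clusters can be backed up with a number, for example the silhouette score per epoch. A minimal sketch on synthetic data; two fabricated "epochs" stand in for early and late training, whereas in the real notebook one would score `data[epoch]` against `labels`:

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Two synthetic "epochs": random features vs. features pulled toward
# per-class centers, standing in for early vs. late training.
labels = rng.integers(0, 9, size=300)
early = rng.normal(size=(300, 32))
centers = rng.normal(scale=5.0, size=(9, 32))
late = centers[labels] + rng.normal(size=(300, 32))

# Silhouette in [-1, 1]: higher means tighter, better-separated clusters
print(silhouette_score(early, labels), silhouette_score(late, labels))
```

Plotting this score over the ten epochs would give a quantitative counterpart to the scatter plots above.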

Generate one table for all downprojection methods and metrics¶

In [ ]:
umap = UMAP(n_components=2, verbose=0, n_jobs=-1)
data_2d = umap.fit_transform(df.drop(["epoch", "image", "label"], axis=1))
df["x"] = data_2d[:, 0]
df["y"] = data_2d[:, 1]
fig = px.scatter(df, x="x", y="y", color="label", title="UMAP on all epochs")
fig.show()
In [10]:
pca = PCA(n_components=2)
data_2d = pca.fit_transform(df.drop(["epoch", "image", "label"], axis=1))
df["x_pca"] = data_2d[:, 0]
df["y_pca"] = data_2d[:, 1]
df["dp_method"] = "pca"
In [25]:
TSNE_METRICS = ["euclidean", "manhattan"]
for metric in TSNE_METRICS:
    tsne = TSNE(n_components=2, verbose=0, n_jobs=-1, metric=metric)
    data_2d = tsne.fit_transform(df.drop(["epoch", "image", "label", "x_pca","y_pca"], axis=1))
    df["x_tsne_{}".format(metric)] = data_2d[:, 0]
    df["y_tsne_{}".format(metric)] = data_2d[:, 1]
In [28]:
UMAP_METRICS = [
    "euclidean",
    "hamming",
    "manhattan",
    "correlation",
]
for metric in UMAP_METRICS:
    umap = UMAP(n_components=2, verbose=0, n_jobs=-1, metric=metric)
    data_2d = umap.fit_transform(df.drop(["epoch", "image", "label", "x_pca","y_pca", "x_tsne_euclidean", "y_tsne_euclidean", "x_tsne_manhattan", "y_tsne_manhattan"], axis=1))
    df["x_umap_{}".format(metric)] = data_2d[:, 0]
    df["y_umap_{}".format(metric)] = data_2d[:, 1]
    
In [29]:
df
Out[29]:
epoch image x_0 x_1 x_2 x_3 x_4 x_5 x_6 x_7 ... x_tsne_manhattan y_tsne_manhattan x_umap_euclidean y_umap_euclidean x_umap_hamming y_umap_hamming x_umap_manhattan y_umap_manhattan x_umap_correlation y_umap_correlation
0 0 0 -0.061640 0.669427 -0.037390 -0.072912 0.177839 -0.117206 0.195436 0.039985 ... -10.051302 21.774466 8.006492 6.078372 9.615277 2.454584 1.377528 8.469635 5.849822 2.634155
1 0 1 -0.174966 -0.015820 -0.057514 0.045858 0.007711 0.070376 0.135525 1.686104 ... -29.252380 -2.116282 8.482591 5.685714 10.234534 3.452316 0.627635 8.200347 6.781205 1.870595
2 0 2 -0.143236 0.192292 -0.024325 0.146251 -0.117585 -0.097257 -0.029092 0.468650 ... -0.650872 25.010033 8.575643 6.443521 10.611340 2.625251 0.978865 8.912443 6.134972 2.466205
3 0 3 0.172030 1.869631 0.315535 -0.115844 0.644933 -0.177177 -0.084760 -0.005555 ... -12.221191 13.205143 7.843828 6.295854 8.878566 2.462410 1.528608 8.757740 6.337809 3.016420
4 0 4 -0.129526 0.141332 -0.056453 0.363036 -0.175806 -0.139477 -0.146133 -0.091358 ... -20.844439 34.261242 8.581761 5.341557 10.086728 4.175112 0.822732 7.705692 6.097578 1.592030
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17415 9 1737 -0.145046 -0.128737 -0.112647 0.142011 1.271666 -0.239937 -0.147267 0.206571 ... 18.987158 88.912521 14.005446 4.747704 14.364185 5.400997 -4.578933 7.493397 6.005784 -2.288573
17416 9 1738 2.309876 -0.107943 0.096737 0.388151 -0.094068 0.135940 1.428808 0.396734 ... -25.133770 -47.618355 3.087091 8.515288 6.055699 0.040828 7.544350 10.324475 6.698108 11.865839
17417 9 1739 -0.109858 -0.120113 2.313648 1.052860 0.306034 0.665497 -0.037204 0.915989 ... 47.409962 -34.578732 0.629738 4.521232 -0.554824 2.005620 10.426287 7.917491 7.289971 14.011857
17418 9 1740 -0.166764 -0.079054 -0.100959 -0.092999 0.723891 -0.145387 0.248314 0.729864 ... -4.563132 75.327049 11.351472 4.740134 12.162279 4.815999 -2.085558 6.704915 8.612834 -1.724663
17419 9 1741 -0.180064 0.002528 -0.110202 0.128322 1.424868 -0.206283 -0.096041 0.099965 ... 12.599486 78.174316 13.844024 5.231505 13.941362 4.533796 -4.492154 8.145507 6.417513 -2.590778

17420 rows × 1297 columns

In [13]:
# save the dataframe to a csv file
# df.to_csv("data.csv", index=False)
# load the dataframe from the csv file
# df = pd.read_csv(r"data.csv")

Animated Plot of the Data using PCA, t-SNE, and UMAP¶

Animation of PCA¶

In [20]:
fig = px.scatter(
    df,
    x="x_pca",
    y="y_pca",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-15, 25],
    range_y=[-15, 25],
    title="Animation of PCA over epochs",
)
# change size of the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()

Observations: PCA is a linear dimension reduction technique that seeks to maximize variance and preserves large pairwise distances. This was seen in our dataset when different classes ended up far apart. However, this way of reducing dimensionality can lead to poor visualizations when dealing with non-linear manifold structures; thus, other dimensionality reduction methods were investigated.
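A minimal illustration of this limitation, using two concentric rings from `sklearn.datasets.make_circles` instead of our features: no linear projection can unfold the rings, so PCA leaves the classes nested inside each other.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA

# Two concentric rings: a non-linear structure a linear projection cannot unfold
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)

# Mean radius per class after PCA: the inner ring stays inside the outer one
r = np.linalg.norm(X_pca, axis=1)
print(r[y == 0].mean(), r[y == 1].mean())
```

Non-linear methods such as t-SNE and UMAP can pull such rings apart, which is the motivation for the sections that follow.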

Animation of t-SNE¶

In [21]:
fig = px.scatter(
    df,
    x="x_tsne_euclidean",
    y="y_tsne_euclidean",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-120, 120],
    range_y=[-120, 120],
    title="Animation of t-SNE with metric euclidean over epochs",
)
# change size of the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()
In [22]:
fig = px.scatter(
    df,
    x="x_tsne_manhattan",
    y="y_tsne_manhattan",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-120, 120],
    range_y=[-120, 120],
    title="Animation of t-SNE with metric manhattan over epochs",
)
# change size of the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()

Observations: t-SNE differs from PCA by preserving only small pairwise distances, i.e. local similarities, whereas PCA is concerned with preserving large pairwise distances to maximize variance. This is seen in our plots, with the clusters clearly further apart than in PCA.
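The claim that t-SNE preserves local neighbourhoods can be checked with `sklearn.manifold.trustworthiness`, which scores how well each point's nearest neighbours in the original space survive the projection. A sketch on synthetic clustered features, standing in for one epoch of `data`:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

# Synthetic clustered features standing in for one epoch's hidden layer
X, _ = make_blobs(n_samples=300, n_features=32, centers=9, random_state=0)

emb_pca = PCA(n_components=2).fit_transform(X)
emb_tsne = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)

# Trustworthiness in (0, 1]: how well local neighbourhoods survive projection
print(trustworthiness(X, emb_pca, n_neighbors=5))
print(trustworthiness(X, emb_tsne, n_neighbors=5))
```

On the real features this would give a per-epoch, per-method score to compare the embeddings beyond visual inspection.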

Animation of UMAP¶

In [23]:
fig = px.scatter(
    df,
    x="x_umap_euclidean",
    y="y_umap_euclidean",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-10, 20],
    range_y=[-5, 20],
    title="Animation of UMAP with metric euclidean over epochs",
)
# change size of the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()
In [24]:
fig = px.scatter(
    df,
    x="x_umap_hamming",
    y="y_umap_hamming",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-5, 20],
    range_y=[-5, 10],
    title="Animation of UMAP with metric hamming over epochs",
)
# change size of the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()
In [25]:
fig = px.scatter(
    df,
    x="x_umap_correlation",
    y="y_umap_correlation",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-10, 20],
    range_y=[-15, 25],
    title="Animation of UMAP with metric correlation over epochs",
)
# change size of the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()

Observations: UMAP is another dimension reduction technique investigated here; it can be used for visualization similarly to t-SNE, but also for general non-linear dimension reduction. It models the manifold with a fuzzy topological structure. The UMAP algorithm is competitive with t-SNE in visualization quality, preserves more of the global structure, and has superior run-time performance.

Connecting the dots¶

In [34]:
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=df[df["epoch"] == 0]["x_tsne_manhattan"],
        y=df[df["epoch"] == 0]["y_tsne_manhattan"],
        mode="markers",
        name="epoch 0",
        marker=dict(
            size=6,
            color="red",
            symbol="circle",
        ),
    )
)
fig.add_trace(
    go.Scatter(
        x=df[df["epoch"] == 9]["x_tsne_manhattan"],
        y=df[df["epoch"] == 9]["y_tsne_manhattan"],
        mode="markers",
        name="epoch 9",
        marker=dict(
            size=6,
            color="blue",
            symbol="square",
        ),
    )
)
for i in range(len(df[df["epoch"] == 0])):
    fig.add_trace(
        go.Scatter(
            x=[
                df[df["epoch"] == 0]["x_tsne_manhattan"].iloc[i],
                df[df["epoch"] == 9]["x_tsne_manhattan"].iloc[i],
            ],
            y=[
                df[df["epoch"] == 0]["y_tsne_manhattan"].iloc[i],
                df[df["epoch"] == 9]["y_tsne_manhattan"].iloc[i],
            ],
            mode="lines",
            line=dict(width=.5, color="black"),
            showlegend=False,
        )
    )
fig.update_layout(
    #label axis
    xaxis_title="x_tsne_manhattan",
    yaxis_title="y_tsne_manhattan",
    width=1000,
    height=1000,
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=12,
            color="black",
        ),
        bgcolor="LightSteelBlue",
        bordercolor="Black",
        borderwidth=2,
    ),
)
# set title
fig.update_layout(title_text="t-SNE with connected points")
fig.show()
In [35]:
fig = go.Figure()
for label in df["label"].unique():
    fig.add_trace(
        go.Scatter(
            x=df[df["label"] == label]["x_tsne_manhattan"],
            y=df[df["label"] == label]["y_tsne_manhattan"],
            mode="lines",
            name=label,
            line=dict(width=0.3),
        )
    )
fig.update_layout(
    #label axis
    xaxis_title="x_tsne_manhattan",
    yaxis_title="y_tsne_manhattan",
    width=1000,
    height=1000,
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=12,
            color="black",
        ),
        bgcolor="LightSteelBlue",
        bordercolor="Black",
        borderwidth=2,
    ),
)
fig.update_layout(title_text="t-SNE with metric manhattan linking the states")
fig.show()
In [36]:
fig = go.Figure()
for label in df["label"].unique():
    fig.add_trace(
        go.Scatter(
            x=df[df["label"] == label]["x_tsne_euclidean"],
            y=df[df["label"] == label]["y_tsne_euclidean"],
            mode="lines",
            name=label,
            line=dict(width=0.4),
        )
    )
fig.update_layout(
    #label axis
    xaxis_title="x_tsne_euclidean",
    yaxis_title="y_tsne_euclidean",
    width=1000,
    height=1000,
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=12,
            color="black",
        ),
        bgcolor="LightSteelBlue",
        bordercolor="Black",
        borderwidth=2,
    ),
)
fig.update_layout(title_text="t-SNE with metric euclidean linking the states")
fig.show()
In [37]:
fig = go.Figure()
for label in df["label"].unique():
    fig.add_trace(
        go.Scatter(
            x=df[df["label"] == label]["x_umap_euclidean"],
            y=df[df["label"] == label]["y_umap_euclidean"],
            mode="lines",
            name=label,
            line=dict(width=0.35),
        )
    )
fig.update_layout(
    #label axis
    xaxis_title="x_umap_euclidean",
    yaxis_title="y_umap_euclidean",
    width=1000,
    height=1000,
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=12,
            color="black",
        ),
        bgcolor="LightSteelBlue",
        bordercolor="Black",
    ),
)
fig.update_layout(title_text="UMAP with metric euclidean linking the states")
fig.show()

Observations: Here we can see which paths the points take. In the middle there is a lot of movement; after that, the points move around much more toward the outside.

In [ ]: